Morphological Segmentation for Keyword Spotting

نویسندگان

  • Karthik Narasimhan
  • Damianos Karakos
  • Richard M. Schwartz
  • Stavros Tsakalidis
  • Regina Barzilay
چکیده

• We explore the impact of morphological segmentation on Keyword Spotting (KWS). ! • Handling out-of-vocabulary (OOV) words is a major challenge in KWS we aim to alleviate this problem by utilizing sub-word units.! • We augment a state-of-the-art KWS system with subword units derived from supervised and unsupervised morphological segmentations, and compare with phonetic and syllabic segmentations. ! • Our experiments demonstrate that morphemes improve overall KWS performance, both individually and in combination with other sub-word units. 3. Segmentation Methods

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A probabilistic method for keyword retrieval in handwritten document images

Keyword retrieval in handwritten document images (word spotting) is very challenging given that OCR accuracy is not yet adequate for handwritten scripts, specially with large lexicons. Various proposed approaches build indices on information such as image features or OCR scores and have improved the performance of the traditional approach that builds index on OCR’ed text. In this paper, we impr...

متن کامل

Zone-based Keyword Spotting in Bangla and Devanagari Documents

In this paper we present a word spotting system in text lines for offline Indic scripts such as Bangla (Bengali) and Devanagari. Recently, it was shown that zone-wise recognition method improves the word recognition performance than conventional full word recognition system in Indic scripts [29]. Inspired with this idea we consider the zone segmentation approach and use middle zone information ...

متن کامل

Document Image Retrieval Based on Keyword Spotting Using Relevance Feedback

Keyword Spotting is a well-known method in document image retrieval. In this method, Search in document images is based on query word image. In this Paper, an approach for document image retrieval based on keyword spotting has been proposed. In proposed method, a framework using relevance feedback is presented. Relevance feedback, an interactive and efficient method is used in this paper to imp...

متن کامل

Keyword Spotting on Korean Document Images by Matching the Keyword Image

In this paper, we propose a keyword spotting system for Korean document images and compare the proposed system with an OCR-based document retrieval system. The system is composed of character segmentation, feature extraction for the query keyword, and word-to-word matching. In the character segmentation step, we propose an effective method to resolve the connection between adjacent characters. ...

متن کامل

Coalescence Type based Confidence Warping for Agglutinative Language Keyword Spotting

In agglutinative languages like Korean, words are formed by joining l affix morphemes to the stem, which leads to high OOV rate in dictionary building. Hence, subword units are usually used as basic language modeling units in Large-Vocabulary Continuous Speech Recognition (LVCSR) or LVCSR based applications such as keyword spotting. In this work, firstly a new word property called coalescence t...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2014